Abstract: Deletions are prevalent in the genomes of SARS-CoV-2 isolates from COVID-19 patients, but their roles in the severity, transmission, and persistence of disease are poorly understood. Millions of COVID-19 swab samples from patients have been sequenced and made available online, offering an unprecedented opportunity to study such deletions. Multiplex PCR-based amplicon sequencing (amplicon-seq) has been the most widely used method for sequencing clinical COVID-19 samples. However, existing bioinformatics methods, when applied to negative control samples sequenced by amplicon-seq, often yield large numbers of false-positive deletions. We found that these false positives commonly occur in short alignments, at low frequency and depth, and near primer-binding sites used for whole-genome amplification. To address this issue, we developed a filtering strategy and validated it with positive control samples containing a known deletion. Our strategy accurately detected the known deletion and removed more than 99% of false positives. Applied to public COVID-19 swab data, this method revealed that deletions occurring independently of transcription regulatory sequences were about 20-fold less common than previously reported; however, they remain more frequent in symptomatic patients. Our optimized approach should enhance the reliability of SARS-CoV-2 deletion characterization from surveillance studies. Finally, our approach may guide the development of more reliable bioinformatics pipelines for genome sequence analyses of other viruses.
Free, publicly accessible full text available April 16, 2026
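The abstract names four properties of false-positive deletions (short alignments, low frequency, low depth, and proximity to primer-binding sites) but gives no thresholds. The following is a minimal Python sketch of how a filter over those four properties might look. It is not the authors' pipeline: the `DeletionCall` record, the function names, and every cutoff value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeletionCall:
    """Hypothetical record for one candidate deletion."""
    start: int             # first deleted reference position
    end: int               # last deleted reference position
    alignment_length: int  # length of the supporting alignment, in bases
    depth: int             # number of reads supporting the deletion
    frequency: float       # fraction of reads at the locus carrying it


# Illustrative cutoffs only; the abstract does not state any values.
MIN_ALIGNMENT_LENGTH = 50
MIN_DEPTH = 5
MIN_FREQUENCY = 0.01
PRIMER_MARGIN = 5  # bases of padding around each primer-binding site


def near_primer(call, primer_sites, margin=PRIMER_MARGIN):
    """Return True if either deletion edge falls in or near a primer site."""
    for site_start, site_end in primer_sites:
        for edge in (call.start, call.end):
            if site_start - margin <= edge <= site_end + margin:
                return True
    return False


def keep_call(call, primer_sites):
    """Keep a call only if it passes all four checks."""
    return (
        call.alignment_length >= MIN_ALIGNMENT_LENGTH
        and call.depth >= MIN_DEPTH
        and call.frequency >= MIN_FREQUENCY
        and not near_primer(call, primer_sites)
    )


def filter_deletions(calls, primer_sites):
    """Return the subset of calls that survive the filter."""
    return [call for call in calls if keep_call(call, primer_sites)]
```

In practice, cutoffs of this kind would be tuned against negative and positive control samples, as the abstract describes, rather than fixed in advance.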
Hybridization events complicate the accurate reconstruction of phylogenies, as they lead to patterns of genetic heritability that are unexpected under traditional, bifurcating models of species trees. This phenomenon has led to the development of methods to infer these varied hybridization events, including both methods that reconstruct networks directly and summary methods that predict individual hybridization events from a subset of taxa. However, a lack of empirical comparisons between methods, especially those pertaining to large networks with varied hybridization scenarios, hinders their practical use. Here, we provide a comprehensive review of popular summary methods: TICR, MSCquartets, HyDe, Patterson’s D-Statistic (ABBA-BABA), D3, and Dp. TICR and MSCquartets are based on quartet concordance factors gathered from gene tree topologies, whereas HyDe, Patterson’s D-Statistic, D3, and Dp use site pattern frequencies to identify hybridization events between sets of three taxa. We then use simulated data to address questions of method accuracy and ideal use scenarios by testing methods against complex networks which depict gene flow events that differ in depth (timing), quantity (single vs. multiple, overlapping hybridizations), and rate of gene flow (γ). We find that deeper or multiple hybridization events may introduce noise and weaken the signal of hybridization, leading to higher relative false negative rates across all methods. Despite some forms of hybridization eluding quartet-based detection methods, MSCquartets displays high precision in most scenarios. While HyDe results in high false negative rates when tested on hybridizations involving extinct or unsampled ghost lineages, HyDe is the only method able to identify the direction of hybridization, distinguishing the source parental lineages from recipient hybrid lineages. Lastly, we test the methods on a dataset of ultraconserved elements from the bee subfamily Nomiinae, finding possible hybridization events between clades which correspond to regions of poor support in the species tree estimated in a previous study.
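To make the idea of site pattern frequencies concrete, here is a minimal Python sketch of Patterson's D-statistic for the standard four-taxon arrangement (((P1, P2), P3), outgroup), where D = (nABBA - nBABA) / (nABBA + nBABA). The function names and the toy sequences are illustrative assumptions, not taken from the study, and the sketch omits the significance test (commonly a block jackknife) that a real analysis would need.

```python
def site_pattern(p1, p2, p3, outgroup):
    """Classify one alignment column as 'ABBA', 'BABA', or None.

    The outgroup allele is treated as ancestral (A); only biallelic
    sites made of unambiguous bases are considered informative.
    """
    column = (p1, p2, p3, outgroup)
    if any(base not in "ACGT" for base in column):
        return None  # skip gaps and ambiguity codes
    if len(set(column)) != 2:
        return None  # skip invariant and multiallelic sites
    derived = (p1 != outgroup, p2 != outgroup, p3 != outgroup)
    if derived == (False, True, True):
        return "ABBA"  # P2 and P3 share the derived allele
    if derived == (True, False, True):
        return "BABA"  # P1 and P3 share the derived allele
    return None


def patterson_d(p1_seq, p2_seq, p3_seq, outgroup_seq):
    """Return (D, n_abba, n_baba); D is None if no informative sites exist."""
    n_abba = 0
    n_baba = 0
    for column in zip(p1_seq, p2_seq, p3_seq, outgroup_seq):
        pattern = site_pattern(*column)
        if pattern == "ABBA":
            n_abba += 1
        elif pattern == "BABA":
            n_baba += 1
    total = n_abba + n_baba
    if total == 0:
        return None, n_abba, n_baba
    return (n_abba - n_baba) / total, n_abba, n_baba


# Toy aligned sequences, invented for illustration.
d, n_abba, n_baba = patterson_d("AATAAN", "CGAACA", "CGTAGA", "AAAAAA")
```

Under incomplete lineage sorting alone, ABBA and BABA patterns are expected in equal numbers, so D near zero is the null expectation; an excess of one pattern is read as a signal of gene flow between P3 and either P2 (positive D) or P1 (negative D).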
